Skip to content

fix: slow start config add min_weight_percent field to avoid too big edf deadline #19712

Merged
yanavlasov merged 16 commits intoenvoyproxy:mainfrom
jiangshantao-dbg:fix/slow-start-formula-result-too-big-edf-dealine
Mar 10, 2022
Merged

fix: slow start config add min_weight_percent field to avoid too big edf deadline #19712
yanavlasov merged 16 commits intoenvoyproxy:mainfrom
jiangshantao-dbg:fix/slow-start-formula-result-too-big-edf-dealine

Conversation

@jiangshantao-dbg
Copy link
Copy Markdown
Contributor

PR to fix #19526

Commit Message:
cluster SlowStartConfig add min_weight_percent field to avoid too big edf deadline. set default value 0.1.

Additional Description:
Risk Level: Low
Testing: Done
Docs Changes: Done
Release Notes:Done
Fixes #19526

@repokitteh-read-only
Copy link
Copy Markdown

Hi @jiangshantao-dbg, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #19712 was opened by jiangshantao-dbg.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #19712 was opened by jiangshantao-dbg.

see: more, trace.

@jiangshantao-dbg jiangshantao-dbg force-pushed the fix/slow-start-formula-result-too-big-edf-dealine branch 2 times, most recently from d99d9be to eb5a7e5 Compare January 27, 2022 06:58
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api
Looked briefly at the code, and left a comment.
Consider adding tests for the minimal (0.0) and maximal (100.0) values.
/wait

Comment on lines 789 to 790
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this will be checked by the PGV validator (you added a constraint to the field that automatically checks that). Also this doesn't take into account equals on both ends.
Better to remove the checks and let PGV validate that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if remove the PGV validator for this field, just use the constructor to init the value. It will throw an exception if the PGV failed, which cause the cluster can't be adding to manager. I think the min_weight_percent should not affect the cluster update, so just the warning message and use the default value 0.1 is enough ?

Also the min_weight_percent can't be 0 which will cause a really big edf deadline, and we want use the default value if the user not specify a value. so the constraint in proto use gte: 0.0(pb default value 0.0) and the constructor use > 0.0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PGV is meant to detect problems early and reject configs that are incorrect, so I encourage using that.

If min_weight_percent must be greater than 0.0, why not use gt instead of gte? Same goes for the upper bound (see here for more info)

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PING?
/wait

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, these days i am in a vacation. i will use {lt: 1.0 gt: 0.0 ignore_empty: true} to the field min_weight_percent, which allows users has not to specify the default value. and will keep checking logic in constructor to ensure the default value to 0.1.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if PGV rejects the config (due to wrong value), this code will never be reached.

@adisuissa
Copy link
Copy Markdown
Contributor

Please let me know what you think about changing the PGV constraint (using lt and gt).
/wait

jiangshantao added 10 commits February 7, 2022 11:26
…edf deadline

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
@jiangshantao-dbg jiangshantao-dbg force-pushed the fix/slow-start-formula-result-too-big-edf-dealine branch from 532a2d2 to 929bd47 Compare February 7, 2022 03:36
Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19712 (comment) was created by @jiangshantao-dbg.

see: more, trace.

@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

Please let me know what you think about changing the PGV constraint (using lt and gt). /wait

@adisuissa sorry, i rebase the branch which cause the reviewing processing reset. i have completed the changes, and all test cases are ok. please have a look.

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.
Looked again at the API style guide, and left a comment to what I think should be changed here.
/wait

// Configures the minimum percentage of origin weight that avoids too small new weight,
// which may cause endpoints in slow start mode receive no traffic in slow start window.
// Range between (0.0, 1.0). If not specified, the default is 0.1.
double min_weight_percent = 3 [(validate.rules).double = {lt: 1.0 gt: 0.0 ignore_empty: true}];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type should be changed to Percent (see api style).
Also note that this implies that the range should be (0, 100).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to use a double wrapper (as the default value isn't 0), and then use PROTOBUF_GET_WRAPPED_OR_DEFAULT to set the value in the c'tor.

Comment on lines +787 to +798
min_weight_percent_ = 0.1;
if (slow_start_config.has_value()) {
if (slow_start_config.value().min_weight_percent() > 0.0 &&
slow_start_config.value().min_weight_percent() < 1.0) {
min_weight_percent_ = slow_start_config.value().min_weight_percent();
} else {
ENVOY_LOG(warn,
"Invalid value {} provided for min_weight_percent parameter, must be in range "
"(0.0, 1.0), so use default value 0.1",
slow_start_config.value().min_weight_percent());
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the field is of type Percent, you can initialize the min_weight_percent_ field in the MIL, and use PROTOBUF_PERCENT_TO_ROUNDED_INTEGER_OR_DEFAULT

Something similar to:

PROTOBUF_PERCENT_TO_DOUBLE_OR_DEFAULT(
         slow_start_config, min_weight_percent, 10))/100.0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for advice. i have change to use Percent. please have a look.

@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

jiangshantao-dbg commented Feb 15, 2022

This is the fix suggested in #19526, so the approach LGTM. I had a few nits in the comments, but overall the docs/code are great.

One thing unrelated to the code that I would suggest is to elaborate on the problem this PR is solving. The release note doesn't make it clear why this was necessary and it would just make it easier for anyone who's curious to quickly get an answer.

Thank you for doing this!

Thank for you advise. i has fix code/docs as the comments suggests. please have a look. @tonya11en

@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19712 (comment) was created by @jiangshantao-dbg.

see: more, trace.

tonya11en
tonya11en previously approved these changes Feb 21, 2022
Copy link
Copy Markdown
Member

@tonya11en tonya11en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM!

@rojkov
Copy link
Copy Markdown
Member

rojkov commented Feb 28, 2022

/wait
on merge conflict

@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

/wait on merge conflict

@rojkov conflict resolved

@rojkov
Copy link
Copy Markdown
Member

rojkov commented Feb 28, 2022

Sorry, you might need to merge the main branch again as it had been broken for some time.

…t-too-big-edf-dealine

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

Sorry, you might need to merge the main branch again as it had been broken for some time.

@rojkov done

@rojkov
Copy link
Copy Markdown
Member

rojkov commented Mar 4, 2022

@tonya11en could you reinstate your approval?

tonya11en
tonya11en previously approved these changes Mar 4, 2022
@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

@adisuissa could you review this pr again. there is 1 workflow awaiting approval. thanks

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
/lgtm api
Left a couple of nits, but otherwise LGTM.

return std::chrono::time_point_cast<std::chrono::milliseconds>(edf_lb.latest_host_added_time_)
.time_since_epoch();
}
static double slowStartMinWeightPercent(EdfLoadBalancerBase& edf_lb) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
static double slowStartMinWeightPercent(EdfLoadBalancerBase& edf_lb) {
static double slowStartMinWeightPercent(const EdfLoadBalancerBase& edf_lb) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const absl::optional<Runtime::Double> aggression_runtime_;
TimeSource& time_source_;
MonotonicTime latest_host_added_time_;
double slow_start_min_weight_percent_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
@jiangshantao-dbg
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19712 (comment) was created by @jiangshantao-dbg.

see: more, trace.

@adisuissa
Copy link
Copy Markdown
Contributor

/assign-from @envoyproxy/senior-maintainers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/senior-maintainers assignee is @yanavlasov

🐱

Caused by: a #19712 (comment) was created by @adisuissa.

see: more, trace.

@yanavlasov yanavlasov merged commit e3fede5 into envoyproxy:main Mar 10, 2022
jiangshantao-dbg added a commit to istio-mt/envoy that referenced this pull request May 10, 2022
…edf deadline (envoyproxy#19712)

* fix: slow start config add min_weight_percent field to avoid too big edf deadline

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
jiangshantao-dbg added a commit to istio-mt/envoy that referenced this pull request May 16, 2022
* fix: slow start config add min_weight_percent field to avoid too big edf deadline  (envoyproxy#19712)

* fix: slow start config add min_weight_percent field to avoid too big edf deadline

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>

* fix: ./tools/proto_format/proto_format.sh fix generate proto

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>

Co-authored-by: jiangshantao <jiangshantao-dbg@qq.com>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…edf deadline (envoyproxy#19712)

* fix: slow start config add min_weight_percent field to avoid too big edf deadline

Signed-off-by: jiangshantao <jiangshantao-dbg@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

slow start mode formula result very big edf dealine_ cause slow start invalid

5 participants